Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the
application. Applicants have submitted a new complete claim set showing any marked-up
claims with insertions indicated by underlining and deletions indicated by strikeouts
and/or double bracketing.

Listing of Claims: 

1. (Currently Amended) A process to evaluate an input string to segment said string into
component parts comprising: 

providing a state transition model based on an existing collection of data records 
that includes probabilities to segment input strings into component parts which adjusts 
said probabilities to account for erroneous token placement in the input string, wherein 
the existing collection of data records does not comprise manually segmented training 
data; 

determining a most probable segmentation of the input string by comparing 
tokens that make up the input string with a state transition model derived from the 
collection of data records; 

segmenting the input string into one or more component parts according to the 
most probable segmentation; and 

storing the one or more component parts in a database on a computer system.

2. (Original) The process of claim 1 wherein the state transition model has probabilities 
for multiple states of said model and a most probable segmentation is determined based 
on a most probable token emission path through different states of the state transition 
model from a beginning state to an end state. 

3. (Currently Amended) The process of claim 1 wherein the collection of data 
records is stored in a database relation and an order of attributes for the database relation

4. (Original) The process of claim 3 wherein the input string is segmented into
sub-components which correspond to attributes of the database relation.

5. (Original) The process of claim 4 wherein the tokens are substrings of said input 
string. 

6. (Original) The process of claim 5 wherein the input string is to be segmented into 
database attributes and wherein each attribute has a state transition model based on the 
contents of the database relation. 

7. (Original) The process of claim 6 wherein the state transition model has multiple 
states for a beginning, middle and trailing position within an input string. 

8. (Original) The process of claim 6 wherein the state transition model has probabilities 
for the states and a most probable segmentation is determined based on a most probable 
token emission path through different states of the state transition model from a 
beginning state to an end state. 

9. (Original) The process of claim 5 wherein input attribute order for records to be 
segmented is known in advance of segmentation of an input string. 

10. (Original) The process of claim 5 wherein an attribute order is learned from a batch 
of records that are inserted into the table. 

11. (Original) The process of claim 6 wherein the state transition model has at least some
states corresponding to base tokens occurring in the reference relation. 

12. (Original) The process of claim 6 wherein the state transition model has class states 
corresponding to token patterns within said reference relation.

13. (Previously Presented) The process of claim 8 wherein the state transition model
includes states that account for missing, misordered and inserted tokens within an 
attribute. 

14. (Original) The process of claim 13 wherein the state transition model has a 
beginning, a middle and a trailing state topology and the process of accounting for 
misordered and inserted tokens is performed by copying states from one of said 
beginning, middle or trailing states into another of said beginning, middle or trailing 
states. 

15. (Previously Presented) A machine computer readable medium containing 
instructions to perform the process of claim 1.

16. (Currently Amended) A process for segmenting strings into component parts 
comprising: 

providing a reference table of string records that are segmented into multiple 
substrings corresponding to database attributes, wherein the reference table of string 
records does not comprise manually segmented training data; 

analyzing the substrings within an attribute to provide a state model that assumes 
a beginning, a middle and a trailing token topology for said attribute, said topology
including a null token for an empty attribute component; 

breaking an input record into a sequence of tokens; 

determining a most probable segmentation of the input record by comparing the 
tokens of the input record with state models derived for attributes from the reference 
table; 

segmenting the input record into one or more component parts according to the 
most probable segmentation; and 

storing the one or more component parts in a database on a computer system.

17. (Currently Amended) A computer system for processing input strings to segment
those records for inclusion into a database comprising: 

a) a database management system to store records organized into relations 
wherein data records within a relation are organized into a number of attributes; 

b) a model building component on the computer system that builds a number of 
attribute recognition models based on an existing relation of data records, wherein the 
existing relation of data records does not comprise manually segmented training data, 
wherein one or more of said attribute recognition models includes probabilities for 
segmenting input strings into component parts which adjusts said probabilities to account 
for erroneous entries within an input string; and 

c) a segmenting component on the computer system that receives an input string 
and determines a most probable record segmentation by evaluating transition 
probabilities of states within the attribute recognition models built by the model building 
component. 

18. (Original) The system of claim 17 wherein the segmenting component receives a
batch of evaluation strings and determines an attribute order of strings in said batch and 
thereafter assumes the input string has tokens in the same attribute order as the evaluation 
strings. 

19. (Original) The system of claim 18 wherein the segmenting component evaluates the 
tokens in an order in which they are contained in the input string and considers state 
transitions from multiple attribute recognition models to find a maximum probability for 
the state of a token to provide a maximum probability for each token in said input string. 

20. (Original) The system of claim 17 wherein the model building component assigns 
states for each attribute for a beginning, middle and trailing token position and wherein 
the model building component relaxes token acceptance by the model by copying states 
among said beginning, middle and trailing token positions.

21. (Original) The system of claim 20 wherein the model building component defines a
start and end state for each model and accommodates missing attributes by assigning a 
probability for a transition from the start to the end state. 

22-24. (Cancelled). 

25. (Currently Amended) A The process of claim 1 segmenting a string input record into 
a sequence of attributes for inclusion into a database table comprising: wherein 
determining a most probable segmentation of the input string comprises 

considering a first token in a string input string record and determining a 
maximum state probability for said token based on state transition models for multiple 
data table attributes, and

considering in turn subsequent tokens in the string input record and determining 
maximum state probabilities for said subsequent tokens from a previous token state until 
all tokens are considered, and

wherein segmenting the input string comprises segmenting the input string record 
by assigning the tokens of the input string to attribute states of the state transition models 
corresponding to said maximum state probabilities, wherein the state transition models 
are based on the existing collection of data records that do not comprise manually
segmented training data. 

26. (Currently Amended) The process of claim 25 additionally further comprising 
determining an attribute order for a batch of string input string records and using the 
order to limit the possible state probabilities when evaluating tokens in an the input 
string. 

27. (Currently Amended) A system for evaluating an input string to segment said input 
string into component parts comprising: 

means for providing a state transition model based on an existing collection of 
data records that includes probabilities to segment input strings into component parts 
which adjusts said probabilities to account for erroneous token placement in the input 
string, wherein the existing collection of data records does not comprise manually
segmented training data; 

means for determining a most probable segmentation of the input string by 
comparing an order of tokens that make up the input string with a state transition model 
derived from the collection of data records; 

means for segmenting the input string into one or more component parts 
according to the most probable segmentation; and 

means for storing the one or more component parts in a database on a computer
system.

28. (Original) The system of claim 27 wherein the state transition model has 
probabilities for multiple states of said model and a most probable segmentation is 
determined based on a most probable token emission path through different states of the 
state transition model from a beginning state to an end state. 

29. (Previously Presented) The system of claim 27 additionally including means for 
maintaining a collection of records, wherein the collection of records is stored in a 
database relation. 

30. (Original) The system of claim 29 wherein the input record is segmented into sub- 
components which correspond to attributes of the database relation. 

31. (Original) The system of claim 30 wherein the tokens are substrings of said input
string. 

32. (Original) The system of claim 30 wherein the input string is to be segmented into 
database attributes and wherein each attribute has a state transition model based on the 
contents of the database relation. 

33. (Original) The system of claim 32 wherein the state transition model has multiple 
states for a beginning, middle and trailing position within an input string.

34. (Original) The system of claim 32 wherein the state transition model has
probabilities for the states and a most probable segmentation is determined based on a 
most probable state path through different states of the state transition model from a 
beginning state to an end state. 